Inbreeding coefficient estimation with dense SNP data: comparison of strategies and application to HapMap III.
Authors
Abstract
BACKGROUND/AIMS: If the parents of an individual are related, the individual may have received, at a given locus, two identical-by-descent alleles that are copies of a single allele carried by the parents' common ancestor. The inbreeding coefficient measures the probability of this event and increases with the relatedness between the parents. It is traditionally computed from the inbreeding loops observed in genealogies, so its accuracy depends on the depth and reliability of those genealogies. With the availability of genome-wide genetic data, it has become possible to compute a genome-based inbreeding coefficient f, and different methods have been developed to estimate f and to identify inbred individuals in a sample from the observed patterns of homozygosity at markers.
METHODS: We performed simulations with known genealogies, using SNP panels with different levels of linkage disequilibrium (LD), to compare several estimators of f: single-point estimates, methods based on the length of runs of homozygosity (ROHs), and different methods that use hidden Markov models (HMMs). We also compared how well some of these estimators identify inbred individuals in a sample, using either HMM likelihood ratio tests or an adapted version of the ERSA software.
RESULTS: Single-point methods had higher standard deviations than the other methods. ROHs gave the best estimates provided the correct length threshold is known. HMMs on sparse data gave equivalent or better results than HMMs modeling LD. Provided LD is correctly accounted for, the inbreeding estimates were very similar across the different SNP panels. The HMM likelihood ratio tests performed better than the adapted ERSA at detecting inbred individuals in a sample. All methods accurately detected inbreeding up to second-cousin offspring. We applied the best method to release 3 of the HapMap phase III project, found that up to 4% of individuals were inbred, and created HAP1067, an unrelated and outbred dataset of this release.
CONCLUSIONS: We recommend using HMMs on multiple sparse maps to estimate and detect inbreeding in large samples. If the sample of individuals is too small to estimate allele frequencies, we advise estimating them on reference panels or using 1,500-kb ROHs. Finally, we advise investigators using HapMap to be careful with inbred individuals, especially in the GIH (Gujarati Indians from Houston, Texas) population.
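To make two of the estimator families named in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes genotypes coded as alternate-allele counts (0, 1, 2, with 1 heterozygous) on a single chromosome with no missing data. The function names are hypothetical. The single-point estimator shown is the common method-of-moments form (observed minus expected homozygous count, divided by marker count minus expected homozygous count), which may differ from the single-point estimators the paper compared. The ROH estimator tolerates no heterozygous calls inside a run, unlike most production tools, and uses the 1,500-kb threshold mentioned in the conclusions as its default.

```python
from typing import List, Sequence, Tuple


def find_rohs(positions_bp: Sequence[int], genotypes: Sequence[int],
              min_length_bp: int = 1_500_000) -> List[Tuple[int, int]]:
    """Return (start_bp, end_bp) for each run of consecutive homozygous
    SNPs whose physical span is at least min_length_bp.

    Assumption: positions are sorted and genotype 1 means heterozygous.
    """
    rohs: List[Tuple[int, int]] = []
    run_start = None  # position of the first SNP in the current run
    run_end = None    # position of the last homozygous SNP seen so far
    for pos, g in zip(positions_bp, genotypes):
        if g != 1:
            if run_start is None:
                run_start = pos
            run_end = pos
        else:
            # A heterozygous call closes the current run, if any.
            if run_start is not None and run_end - run_start >= min_length_bp:
                rohs.append((run_start, run_end))
            run_start = None
    # Close a run that extends to the last SNP.
    if run_start is not None and run_end - run_start >= min_length_bp:
        rohs.append((run_start, run_end))
    return rohs


def f_roh(positions_bp: Sequence[int], genotypes: Sequence[int],
          min_length_bp: int = 1_500_000) -> float:
    """Fraction of the SNP-covered span that lies inside ROHs."""
    covered = positions_bp[-1] - positions_bp[0]
    if covered <= 0:
        raise ValueError("need at least two SNPs at distinct positions")
    in_roh = sum(end - start
                 for start, end in find_rohs(positions_bp, genotypes,
                                             min_length_bp))
    return in_roh / covered


def f_single_point(genotypes: Sequence[int],
                   alt_freqs: Sequence[float]) -> float:
    """Method-of-moments estimate: (O_hom - E_hom) / (N - E_hom)."""
    n = len(genotypes)
    observed_hom = sum(1 for g in genotypes if g != 1)
    # Under Hardy-Weinberg proportions, P(homozygous) = 1 - 2p(1 - p).
    expected_hom = sum(1.0 - 2.0 * p * (1.0 - p) for p in alt_freqs)
    return (observed_hom - expected_hom) / (n - expected_hom)
```

Both functions depend on inputs the abstract flags as critical: `f_single_point` needs reliable allele frequencies, and `f_roh` needs a suitable length threshold, which is why the `min_length_bp` parameter is exposed rather than fixed.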
Similar resources
On the application of estimation of distribution algorithms to multi-marker tagging SNP selection
This paper presents an algorithm for the automatic selection of a minimal subset of tagging single nucleotide polymorphisms (SNPs) using an estimation of distribution algorithm (EDA). The EDA stochastically searches the constrained space of possible feasible solutions and takes advantage of the underlying topological structure defined by the SNP correlations to model the problem interactions. T...
Run of Homozygosity a Procedure to Detecting Inbreeding in Farm Animals
Inbreeding depression is a harmful phenomenon in livestock which is outcome of inbreeding. Inbreeding is consequence mating between two individuals who are more related to each other than average relatedness in population, which results in reducing in fitness of progenies and genetic variability in populations. Development of high-density genome-wide single nucleotide polymorphism (SNP) array f...
Inbreeding and Inbreeding Depression on Body Weight in Iranian Shal Sheep
The aim of this study was to estimate amount of inbreeding coefficient in Shal sheep and its impact on growth performance. Pedigree information and body weight at different ages (birth weight, 3 month weight, 6 month weight, 9 month weight and 12 month weight) were used from 6692 lambs from 90 rams and 1007 ewes. Data were collected on Ghazvin sheep breeding station during 1997-2013. Estimation...
A comparison of cataloged variation between International HapMap Consortium and 1000 Genomes Project data
BACKGROUND Since publication of the human genome in 2003, geneticists have been interested in risk variant associations to resolve the etiology of traits and complex diseases. The International HapMap Consortium undertook an effort to catalog all common variation across the genome (variants with a minor allele frequency (MAF) of at least 5% in one or more ethnic groups). HapMap along with advan...
Inbreeding depression across the lifespan in a wild mammal population.
Inbreeding depression is of major concern for the conservation of threatened species, and inbreeding avoidance is thought to be a key driver in the evolution of mating systems. However, the estimation of individual inbreeding coefficients in natural populations has been challenging, and, consequently, the full effect of inbreeding on fitness remains unclear. Genomic inbreeding coefficients may ...
Journal:
- Human Heredity
Volume 77, Issue 1-4
Pages: -
Publication date: 2014